8 research outputs found

    Assessing Species Diversity Using Metavirome Data: Methods and Challenges

    Assessing biodiversity is an important step in the study of microbial ecology associated with a given environment. Multiple indices have been used to quantify species diversity, which is a key biodiversity measure. Measuring species diversity of viruses in different environments remains a challenge relative to measuring the diversity of other microbial communities. Metagenomics has played an important role in elucidating viral diversity through metavirome studies; however, metavirome data are highly complex and require robust data preprocessing and analysis methods. In this review, existing bioinformatics methods for measuring species diversity using metavirome data are categorised broadly as either sequence similarity-dependent methods or sequence similarity-independent methods. The former compare DNA fragments or assemblies generated in the experiment against reference databases to quantify species diversity, whereas estimates from the latter are independent of the knowledge of existing sequence data. Current methods and tools are discussed in detail, including their applications and limitations. Drawbacks of the state-of-the-art method are demonstrated through results from a simulation. In addition, alternative approaches are proposed to overcome the challenges in estimating species diversity measures using metavirome data. DH is fully supported by the PhD scholarships of The University of Melbourne. This work is also supported by Australian Research Council grant LP140100670 and the industry partner YourGeneBioScience.
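The abstract above refers to indices for quantifying species diversity without naming them. As a minimal sketch, the block below computes three widely used measures (species richness, the Shannon index and the Gini-Simpson index) from per-species counts. The choice of indices, the function names and the example counts are assumptions made for illustration; they are not taken from the review.

```python
import math


def _validated_total(counts):
    """Return the total count, rejecting negative or all-zero input."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one count must be positive")
    return total


def richness(counts):
    """Species richness: the number of species with a non-zero count."""
    _validated_total(counts)
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over species with p_i > 0."""
    total = _validated_total(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i ** 2)."""
    total = _validated_total(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical per-species read counts, for illustration only.
example_counts = [50, 30, 15, 5]
print(richness(example_counts))
print(round(shannon_index(example_counts), 3))
print(round(gini_simpson_index(example_counts), 3))
```

All three functions take counts as given, which is the hard part for viromes: per the abstract, assigning reads to species is itself difficult when most viruses are absent from reference databases.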

    CoMet: A workflow using contig coverage and composition for binning a metagenomic sample with high precision

    Background: In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequency space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performance of CoMet was compared against four existing approaches for binning a single metagenomic sample, namely MaxBin, Metawatt, MyCC (default) and MyCC (coverage), using multiple datasets including a sample comprised of multiple strains. Results: Binning methods based on both compositional features and coverages of contigs performed better than the method based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) ranked highest in F1-score. However, the performance of CoMet was higher than that of MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18-39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. Conclusions: The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains.
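The two feature spaces named in the abstract above (GC content against coverage, then tetranucleotide frequencies) can be sketched as follows. This is an illustration, not CoMet's implementation. The function names, the natural-log transform of coverage and the use of all 256 tetranucleotides (with no reverse-complement merging) are assumptions. The DBSCAN clustering step is omitted.

```python
import math
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 tetranucleotides in a contig."""
    seq = seq.upper()
    counts = {"".join(kmer): 0 for kmer in product("ACGT", repeat=4)}
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid tetranucleotide")
    return {kmer: n / total for kmer, n in counts.items()}


def gc_coverage_point(seq, coverage):
    """Point in (GC content, log coverage) space; the log base is an assumption."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return gc_content(seq), math.log(coverage)


# Hypothetical contig and coverage value, for illustration only.
contig = "ATGCGCGATATCGCGCGTTAACCGG"
print(gc_coverage_point(contig, coverage=12.5))
print(len(tetranucleotide_frequencies(contig)))
```

A density-based clusterer such as DBSCAN would then be run on these vectors, first in the two-dimensional space and then in the tetranucleotide space. The abstract does not state parameter values.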

    ENVirT: inference of ecological characteristics of viruses from metagenomic data

    Background: Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community before being able to estimate other parameters, such as viral richness. Although this approach has been widely used, it can adversely skew results since the majority of viruses are yet to be catalogued in databases. Results: In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we also show that it is possible to simultaneously estimate the average genome length without a priori information. This is shown to be a significant improvement over database-dependent methods, since we can now robustly analyze samples that may include novel viral types under-represented in current databases. We demonstrate that the viral richness estimates produced by ENVirT are several orders of magnitude higher in accuracy than the estimates produced by the existing methods PHACCS and CatchAll when benchmarked against simulated data. We repeated the analysis of 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in vitro analyses. Conclusions: These insights were previously not captured by existing computational methods. As such, ENVirT is shown to be an essential tool for enhancing our understanding of novel viral populations. This work was supported partially by Australian Research Council [grant numbers LP140100670 and DP150103512] and the Biodiversity Research Center, Academia Sinica, Taiwan. DJ, DH, DS and YS were funded by the MIFRS and MIRS scholarships of The University of Melbourne. Publication costs were funded by The Australian National University.

    Reactive oxygen species and male reproductive hormones

    Reports of the increasing incidence of male infertility paired with decreasing semen quality have triggered studies on the effects of lifestyle and environmental factors on the male reproductive potential. There are numerous exogenous and endogenous factors that are able to induce excessive production of reactive oxygen species (ROS) beyond that of cellular antioxidant capacity, thus causing oxidative stress. In turn, oxidative stress negatively affects male reproductive functions and may induce infertility either directly or indirectly by affecting the hypothalamus-pituitary-gonadal (HPG) axis and/or disrupting its crosstalk with other hormonal axes. This review discusses the important exogenous and endogenous factors leading to the generation of ROS in different parts of the male reproductive tract. It also highlights the negative impact of oxidative stress on the regulation of, and crosstalk between, the reproductive hormones. It further describes the mechanism of ROS-induced derangement of male reproductive hormonal profiles that could ultimately lead to male infertility. An understanding of the disruptive effects of ROS on male reproductive hormones would encourage further investigations directed towards the prevention of ROS-mediated hormonal imbalances, which in turn could help in the management of male infertility.

    Methods for profiling heterogeneous sequencing data

    © 2019 Damayanthi Kumari Herath Herath Mudiyanselage. Metagenomics, which utilises high-throughput DNA sequencing, is widely applied to study bacteria and viruses and their effects on their host environments. Metagenomics involves collective sequencing of genetic material of the species in an environmental sample, subsequently requiring robust methods to elucidate the characteristics of the species in the sample from the heterogeneous data. A key step in learning the taxonomic diversity of a metagenomic sample is binning. Binning refers to grouping the nucleotide sequences belonging to an individual or closely related species. Identification of appropriate features and machine learning methods is essential in binning a metagenome of many unknown genomes. A significant challenge in binning metagenomic sequences is to bin a sample of closely related species. The thesis addresses this challenge and proposes a new two-tiered workflow called Coverage and composition based binning of Metagenomes (CoMet) for binning assembled sequences (contigs) of a metagenomic sample. It is demonstrated that a combination of features coupled with appropriate unsupervised learning methods can improve the precision in binning while enabling characterization of more species in a metagenome of species with similar genetic variants. Species richness is a key species diversity measure which corresponds to the number of species in an environmental sample. Estimating species richness of a metagenome of viruses (i.e. a virome) based on the reference data is challenging because of the limited amount of sequence data of viruses available in reference databases. A limitation identified with the methods that do not rely on reference sequence data in estimating species richness is the assumption of equal genome length for all the species in the sample. The thesis addresses this limitation by proposing a method to estimate species richness from a virome considering the variability of the genome lengths of species in the sample. The proposed method enables inference of the genome length distribution from the metagenomic sequence data in addition to estimating the species richness. RNA-Seq refers to a set of techniques enabling the effective study of the transcriptome. An application of RNA-Seq is differential transcript usage (DTU) analysis, which refers to inferring differences in the expression of multiple transcripts (isoforms) of a gene across different conditions from the sequencing data generated in an experiment. A key step in RNA-Seq data analysis is aligning the sequence reads to a reference sequence. SuperTranscripts are an alternative reference sequence proposed mainly for analyzing organisms with no or incomplete reference sequences. The thesis explores the use of superTranscripts to test for DTU in organisms with good reference sequences and annotations. Three definitions of counting bins based on superTranscripts, which are further used to infer DTU in genes, are considered. The results with simulated data of fruit fly and human demonstrate that superTranscripts enable the analysis of DTU in genes with better control of the False Discovery Rate (FDR) than the standard methods while not requiring the prior estimation of isoform abundances. The analysis of real data demonstrates the effectiveness of using superTranscripts to visualize the DTU in genes.

    Semantic Segmentation using Vision Transformers: A survey

    Semantic segmentation has a broad range of applications in a variety of domains including land coverage analysis, autonomous driving, and medical image analysis. Convolutional neural networks (CNNs) and Vision Transformers (ViTs) provide the architecture models for semantic segmentation. Even though ViTs have proven successful in image classification, they cannot be directly applied to dense prediction tasks such as image segmentation and object detection, since ViT is not a general-purpose backbone due to its patch partitioning scheme. In this survey, we discuss some of the different ViT architectures that can be used for semantic segmentation and how their evolution addressed the above-stated challenge. The rise of ViT and its high success rate motivated the community to slowly replace the traditional convolutional neural networks in various computer vision tasks. This survey aims to review and compare the performances of ViT architectures designed for semantic segmentation using benchmarking datasets. This will help the community gain knowledge regarding the implementations carried out in semantic segmentation and discover more efficient methodologies using ViTs. Comment: 35 pages, 13 figures, 2 tables
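The patch partitioning scheme mentioned in the abstract above can be illustrated with a small, framework-free sketch. It splits an image into non-overlapping square patches and flattens each one into a token, which is the general idea behind ViT inputs. The function names and the nested-list image representation are assumptions made for illustration and do not come from the survey.

```python
def patch_grid(height, width, patch_size):
    """Number of patch rows and columns for non-overlapping square patches."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return height // patch_size, width // patch_size


def partition_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened patches, row-major."""
    height, width = len(image), len(image[0])
    rows, cols = patch_grid(height, width, patch_size)
    patches = []
    for r in range(rows):
        for c in range(cols):
            patch = [
                image[r * patch_size + i][c * patch_size + j]
                for i in range(patch_size)
                for j in range(patch_size)
            ]
            patches.append(patch)
    return patches


# A 224x224 input with 16x16 patches yields a 14x14 grid, i.e. 196 tokens.
print(patch_grid(224, 224, 16))

# Toy 4x4 single-channel image whose pixel values equal their flat index.
toy_image = [[4 * y + x for x in range(4)] for y in range(4)]
print(partition_into_patches(toy_image, 2))
```

The token grid is much coarser than the pixel grid, which is one way to see why per-pixel prediction needs extra architectural work on top of a plain ViT.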

    Additional file 1 of CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision

    The Supplementary Material file contains the details of recall values obtained in this study, the GC content distributions, and the GC content - log(Coverage) distributions of the contigs in the simulated datasets considered in this study. (PDF 203 kb)